Video Game Sales DataViz

  • Created by Andrés Segura Tinoco
  • Created on Mar 17, 2020

Visual Analytics project to analyze and discovery insights of video game sales in recent years, with the high-level API plotly.express.

In [1]:
# Load the Pandas libraries
import pandas as pd
In [2]:
# Load Plot libraries
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

Loading raw data

The first step is to load the dataset into a pandas dataframe. Here you can see the dataset.

In [3]:
dataURL = "../data/vgsales_2020.csv"
raw_data = pd.read_csv(dataURL)
In [4]:
raw_data
Out[4]:
Rank Name Platform Year Genre Publisher Developer Global_Sales
0 1 Wii Sports Wii 2006 Sports Nintendo NaN 82.74
1 2 Super Mario Bros. NES 1985 Platform Nintendo NaN 40.24
2 3 Mario Kart Wii Wii 2008 Racing Nintendo NaN 35.82
3 4 Wii Sports Resort Wii 2009 Sports Nintendo NaN 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo NaN 31.37
... ... ... ... ... ... ... ... ...
17806 17807 This is the Police NS 2017 Simulation THQ Nordic Weappy Studio 0.00
17807 17808 Starry Sky: Summer Stories PSV 2017 Adventure honeybee honeybee 0.00
17808 17809 Kingdom Hearts HD I.5 + II.5 ReMIX PS4 2017 Action Square Enix Square Enix 0.00
17809 17810 DOOM VFR PC 2017 Shooter Bethesda Softworks id Software 0.00
17810 17811 Marvel vs. Capcom: Infinite PC 2017 Fighting Capcom Capcom 0.00

17811 rows × 8 columns

Now the basic statistics of the numeric fields are shown, to have a quick understanding of the behavior of the data.

In [5]:
raw_data.describe()
Out[5]:
Rank Year Global_Sales
count 17811.000000 17811.000000 17811.000000
mean 8906.000000 2007.319409 0.518108
std 5141.737158 6.370235 1.522937
min 1.000000 1980.000000 0.000000
25% 4453.500000 2004.000000 0.050000
50% 8906.000000 2008.000000 0.160000
75% 13358.500000 2011.000000 0.450000
max 17811.000000 2020.000000 82.740000

1. Video Games Sales per Release Year

Important Note: Sales will be grouped by release year of game, not by real sale date as we don't have such historical data.

In [6]:
# Total sales
gd_sales = raw_data.groupby(["Year"]).sum()
gd_sales.reset_index(inplace=True)
In [7]:
# Plot global trend
fig = px.line(gd_sales, x="Year", y="Global_Sales")
fig.add_shape(dict(type="line", x0=2008, y0=0, x1=2008, y1=700, line=dict(color="RoyalBlue", width=2, dash="dot")))
fig.update_layout(height=400)
fig.update_xaxes(title_text="Release Year")
fig.update_yaxes(title_text="# Global Sales")
fig.show()

Insights

  • It is clearly observed that the release year with the subsequent highest sales of video games was 2008. Keep in mind that for many of the games released later, sales have not been reported or are still selling.

2. Top 50 Best-Selling Video Games

Now we can plot the top 50 best-selling video games in the world.

In [8]:
# Data
top_games = 50
raw_data.head(10)
Out[8]:
Rank Name Platform Year Genre Publisher Developer Global_Sales
0 1 Wii Sports Wii 2006 Sports Nintendo NaN 82.74
1 2 Super Mario Bros. NES 1985 Platform Nintendo NaN 40.24
2 3 Mario Kart Wii Wii 2008 Racing Nintendo NaN 35.82
3 4 Wii Sports Resort Wii 2009 Sports Nintendo NaN 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996 Role-Playing Nintendo NaN 31.37
5 6 Tetris GB 1989 Puzzle Nintendo NaN 30.26
6 7 New Super Mario Bros. DS 2006 Platform Nintendo NaN 30.01
7 8 Wii Play Wii 2006 Misc Nintendo NaN 29.02
8 9 New Super Mario Bros. Wii Wii 2009 Platform Nintendo NaN 28.62
9 10 Duck Hunt NES 1984 Shooter Nintendo NaN 28.31
In [9]:
# Plot the best-selling video games, colored by Publisher
fig = px.bar(raw_data.head(top_games), x = 'Global_Sales', y = 'Name', color='Publisher', 
             orientation='h', hover_data=["Platform"])
fig.update_layout(yaxis={'categoryorder':'total ascending'}, showlegend=True)
fig.update_layout(height=800, title_text="Top 50 Best-Selling Video Games")
fig.update_xaxes(title_text="# Global Sales")
fig.update_yaxes(title_text="")
fig.show()

Insights

  • The best-selling game is Wii Sports with approximately 80M copies. Keep in mind, that this game came by default with the first Wii console.
  • The second best-selling game is GTA V, on 3 different platforms: PS3, PS4, Xbox 360.
  • Also, the best-selling video game company is Nintendo (Royal Blue color).

3. Video Game Sales grouped by Platform

In [10]:
# Grouped data
gd = raw_data.groupby(['Platform', 'Publisher']).sum()
gd.reset_index(inplace=True)
gd.head(10)
Out[10]:
Platform Publisher Rank Year Global_Sales
0 2600 20th Century Fox Video Games 28367 9907 1.94
1 2600 Activision 106590 49571 18.38
2 2600 Answer Software 4135 1982 0.50
3 2600 Atari 180490 83266 41.30
4 2600 Avalon Interactive 8709 1982 0.17
5 2600 Bomb 7529 1982 0.22
6 2600 CBS Electronics 5981 1982 0.31
7 2600 CPG Products 3865 1982 0.54
8 2600 Coleco 22120 9906 3.06
9 2600 Data Age 10978 3963 0.71
In [11]:
# Plot Video Game Sales grouped by Platform
fig = px.treemap(gd, path=['Platform', 'Publisher'], values='Global_Sales')
fig.show()

Insights

  • The 4 platforms that have contributed the most to video game sales are: PS2, PS3, X360 and Wii.
  • Closely followed by Nintendo DS and PS (1).

4. Video Game Sales grouped by Publisher

In [12]:
# Plot Video Game Sales grouped by Publisher
fig = px.treemap(gd, path=['Publisher', 'Platform'], values='Global_Sales')
fig.show()

Insights

  • Regarding the biggest publishers of video games, Nintendo clearly wins.
  • Other great cross-platform publishers are: Electronic Arts, Activision and Ubisoft.

Important Note: Sales will be grouped by release year of game, not by real sale date as we don't have such historical data.

In [13]:
top_companies = 10
In [14]:
# Top 10 Companies
gd = raw_data.groupby(['Publisher']).sum()
gd = gd.sort_values(by='Global_Sales', ascending=False)
top_companies = list(gd.head(top_companies).index)
top_companies
Out[14]:
['Nintendo',
 'Electronic Arts',
 'Activision',
 'Sony Computer Entertainment',
 'Ubisoft',
 'Take-Two Interactive',
 'THQ',
 'Konami Digital Entertainment',
 'Sega',
 'Namco Bandai Games']
In [15]:
# Grouped data
gd = raw_data[raw_data["Publisher"].isin(top_companies)].groupby(['Year', 'Publisher']).sum()
gd = gd.sort_values(by='Year', ascending=True)
gd.reset_index(inplace=True)
gd.head(10)
Out[15]:
Year Publisher Rank Global_Sales
0 1980 Activision 21002 3.02
1 1981 Activision 17470 8.50
2 1982 Activision 20377 1.86
3 1982 Sega 5016 0.40
4 1983 Activision 13433 1.94
5 1983 Nintendo 7662 10.96
6 1984 Activision 6621 0.27
7 1984 Namco Bandai Games 6032 3.43
8 1984 Nintendo 9240 45.56
9 1985 Activision 20230 0.48

Plotting Sales Trends of Top 10 Publishers

In [16]:
fig = px.line(gd, x="Year", y="Global_Sales", color='Publisher')
fig.update_layout(title_text="Sales Trends of Top 10 Publishers")
fig.update_xaxes(title_text="Release Year")
fig.update_yaxes(title_text="# Global Sales")
fig.show()

This multi-line chart confirms the insights obtained in point 4.

6. Distribution of Video Game Sales

Regarding Platform and Genre from 2013.

In [17]:
# Parallel Categories Diagram
fig = px.parallel_categories(raw_data.query("Year >= 2013"), dimensions=["Platform", "Genre"])
fig.show()

Insights

  • As of 2013, the majority of video games sold were of the genres: Action, Role-Playing (RPG) and Adventure, closely followed by the genres: Sports and Shooter.
  • Regarding the platforms that sold the most video games as of 2013, PlayStation (PS3, PS4 and PSV) is the clear winner, contributing approximately 40% of video games sold.
  • The participation of games by gender is practically distributed uniformly by platform.

7. Evolution of Video Game Sales by Genre

In [18]:
# Top 10 Companies
gd = raw_data.groupby(['Year', 'Genre']).sum()
gd.reset_index(inplace=True)
gd
Out[18]:
Year Genre Rank Global_Sales
0 1980 Action 5595 0.34
1 1980 Fighting 2749 0.77
2 1980 Misc 17658 2.71
3 1980 Shooter 839 7.07
4 1980 Sports 4202 0.49
... ... ... ... ...
434 2020 Racing 45427 0.11
435 2020 Role-Playing 30389 1.64
436 2020 Simulation 6123 0.29
437 2020 Sports 14764 0.03
438 2020 Strategy 8977 0.15

439 rows × 4 columns

In [19]:
# Parallel Categories Diagram
fig = px.area(raw_data, x='Year', y='Global_Sales', color='Genre')
fig.update_layout(title_text="Evolution of Video Game Sales by Genre")
fig.update_xaxes(title_text="Release Year")
fig.update_yaxes(title_text="# Global Sales")
fig.show()

Video Game Sales by Genre by Decades

In [20]:
# Cook the data
gd_1980s = raw_data.query("Year>=1980 and Year<1990")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_1980s.reset_index(inplace=True)

gd_1990s = raw_data.query("Year>=1900 and Year<2000")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_1990s.reset_index(inplace=True)

gd_2000s = raw_data.query("Year>=2000 and Year<2010")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_2000s.reset_index(inplace=True)

gd_2010s = raw_data.query("Year>=2010")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_2010s.reset_index(inplace=True)
In [21]:
# Create subplots: use 'domain' type for Pie subplot
labels = ["US", "China", "European Union", "Russian Federation", "Brazil", "India", "Rest of World"]
fig = make_subplots(rows=2, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=gd_1980s["Genre"], values=gd_1980s["Global_Sales"], name="1980s", title="1980s", hole=.3), 1, 1)
fig.add_trace(go.Pie(labels=gd_1990s["Genre"], values=gd_1990s["Global_Sales"], name="1990s", title="1990s", hole=.3), 1, 2)
fig.add_trace(go.Pie(labels=gd_2000s["Genre"], values=gd_2000s["Global_Sales"], name="2000s", title="2000s", hole=.3), 2, 1)
fig.add_trace(go.Pie(labels=gd_2010s["Genre"], values=gd_2010s["Global_Sales"], name="2010s", title="2010s", hole=.3), 2, 2)
fig.update_layout(height=800, title_text="Video Game Sales by Genre by Decades")
fig.show()

Insights

Finally, how the video game genre trend has changed in each decade:

  • In the 1980s, Platform games were the most played, with 32.5%, now (in the 2010s) they only represent 4.86%.
  • Instead, the Action genre is the most famous now with a 25.4% share.
  • The 1990s was one of the most balanced, where RPG games had more prominence (12.1%) and Platform games began to decline.
  • In the 2000s it was when eSport games had their biggest boom, representing 17.6% of the gaming market.